Bagging, Boosting, and C4.5

Author

  • J. Ross Quinlan
Abstract

Breiman's bagging and Freund and Schapire's boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weights of training instances. This paper reports results of applying both techniques to a system that learns decision trees and testing on a representative collection of datasets. While both approaches substantially improve predictive accuracy, boosting shows the greater benefit. On the other hand, boosting also produces severe degradation on some datasets. A small change to the way that boosting combines the votes of learned classifiers reduces this downside and also leads to slightly better results on most of the datasets considered.
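The abstract describes both procedures only at a high level. As a concrete reference, the sketch below shows generic bagging and AdaBoost.M1-style boosting around a decision-tree learner in Python. It is an illustration under stated assumptions, not the paper's exact method: scikit-learn's CART trees stand in for C4.5, class labels are assumed to be integers 0..k-1, and the modified vote combination mentioned at the end of the abstract is not implemented because the abstract does not specify it.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging(X, y, n_trees=10, rng=np.random.default_rng(0)):
    # Bagging: train each tree on a bootstrap replicate of the data,
    # then predict by an unweighted majority vote over the committee.
    n = len(y)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)               # sample n cases with replacement
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

    def predict(Xt):
        votes = np.stack([t.predict(Xt) for t in trees])   # shape (n_trees, n_cases)
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
    return predict

def boosting(X, y, n_trees=10):
    # Boosting (AdaBoost.M1 style): reweight training instances after each round,
    # then combine the trees by a weighted vote.
    n = len(y)
    w = np.full(n, 1.0 / n)                            # instance weights, initially uniform
    trees, alphas = [], []
    for _ in range(n_trees):
        t = DecisionTreeClassifier().fit(X, y, sample_weight=w)
        wrong = (t.predict(X) != y)
        err = max(float(np.dot(w, wrong)), 1e-12)      # weighted training error
        if err >= 0.5:                                 # classifier too weak: stop
            break
        beta = err / (1.0 - err)
        w = np.where(wrong, w, w * beta)               # shrink weights of correctly classified cases
        w /= w.sum()
        trees.append(t)
        alphas.append(np.log(1.0 / beta))              # this classifier's voting weight

    labels = np.unique(y)

    def predict(Xt):
        score = np.zeros((len(Xt), len(labels)))
        for t, a in zip(trees, alphas):
            score[np.arange(len(Xt)), np.searchsorted(labels, t.predict(Xt))] += a
        return labels[score.argmax(axis=1)]
    return predict

Either function returns a predictor that can be compared against a single DecisionTreeClassifier on held-out data, which mirrors the comparison the paper reports for C4.5.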


Similar resources


Bagging, Boosting, and C4.5

Breiman’s bagging and Freund and Schapire’s boosting are recent methods for improving the predictive power of classifier learning systems. Both form a set of classifiers that are combined by voting, bagging by generating replicated bootstrap samples of the data, and boosting by adjusting the weights of training instances. This paper reports results of applying both techniques to a system that l...


An Empirical Study of Combined Classifiers for Knowledge Discovery on Medical Data Bases

This paper compares the accuracy of combined classifiers in medical data bases to the same knowledge discovery techniques applied to generic data bases. Specifically, we apply Bagging and Boosting methods for 16 medical and 16 generic data bases and compare the accuracy results with a more traditional approach (C4.5 algorithm). Bagging and Boosting methods are applied using different numbers of...


Classifying Unseen Cases with Many Missing Values

Handling missing attribute values is an important issue for classifier learning, since missing attribute values in either training data or test (unseen) data affect the prediction accuracy of learned classifiers. In many real KDD applications, attributes with missing values are very common. This paper studies the robustness of four recently developed committee learning techniques, including Boost...


Stochastic Attribute Selection Committees

Classifier committee learning methods generate multiple classifiers to form a committee by repeated application of a single base learning algorithm. The committee members vote to decide the final classification. Two such methods, Bagging and Boosting, have shown great success with decision tree learning. They create different classifiers by modifying the distribution of the training set. This paper stu...



Journal title:

Volume   Issue

Pages  -

Publication date: 1996